A model of voiced-sound generation is derived in which the detailed 
acoustic behavior of the human vocal cords and the vocal tract is computed. 
The vocal cords are approximated by a self-oscillating source composed of 
two stiffness-coupled masses. The vocal tract is represented as a bilateral 
transmission line. One-dimensional Bernoulli flow through the vocal cords 
and plane-wave propagation in the tract are used to establish acoustic 
factors dominant in the generation of voiced speech. A difference-equation 
description of the continuous system is derived, and the cord-tract system 
is programmed for interactive study on a DDP-516 computer. Sampled 
waveforms are calculated for: acoustic volume velocity through the cord 
opening (glottis); glottal area; and mouth-output sound pressure. Functional 
relations between fundamental voice frequency, subglottal (lung) pressure, 
cord tension, glottal area, and duty ratio of cord vibration are also deter- 
mined. 

Results show that the two-mass model duplicates principal features of 
cord behavior in the human. The variation of fundamental frequency with 
subglottal pressure is found to be 2 to 3 Hz/cm H2O, and is essentially 
independent of vowel configuration in the programmed tract. Acoustic 
interaction between tract eigenfrequencies and glottal volume flow is strong. 
Phase difference in motion of the cord edges is in the range of to 60 degrees, 
and control of cord tension leads to behavior analogous to chest/falsetto 
conditions in the human. The phonation-neutral, or rest, area of the cord opening 
is shown to be a critical factor in establishing self-oscillation. Finally, 
the complete synthesis system suggests an efficient, physiological description 
of the speech signal, namely, in terms of subglottal pressure, cord tension, 
rest area of cord opening, and vocal-tract shape. 

I. GENERATION OF VOICED SOUNDS IN SPEECH 

The vocal cords constitute the sound source for all voiced sounds 
in speech. The cords consist of opposing ligaments which form a 
constriction at the top of the trachea where it joins the lower vocal 
tract. When air is expelled at sufficient velocity through this orifice 
(the glottis), the cords vibrate and act as an oscillating valve which 
interrupts the air flow into a series of pulses. These pulses of volume 
flow serve as the excitation source for the vocal tract in all voiced 
sounds, and the passive resonances of the vocal tract are excited by 
the glottal pulses. Voice quality and prosodic features of speech are 
therefore strongly dependent upon the properties of cord vibration. 
In the synthesis of speech by machines (for automatic voice response 
from computers, for example) it is desirable to make the synthetic 
voice as natural sounding as possible. Toward this end, it is necessary 
to understand the fundamental acoustic principles of voiced-sound 
generation and how these factors might be incorporated into a machine 
voice. Further, in a rather different area, the successful medical diag- 
nosis (and correction) of voice disorders depends upon an understanding 
of the critical parameters of vocal-cord behavior. As in the case of 
computer synthesis, medical diagnosis can be facilitated through an 
accurate and viable model of the human vocal cords. Applications 
such as these, together with fundamental interests in the acoustics of 
speech, provide a motivation for modeling the acoustic behavior of 
the vocal cords. 

II. SELF-OSCILLATING MODELS OF THE VOCAL CORDS 

The first quantitative self-oscillating model of the vocal cords was 
devised by one of the authors and implemented with a vocal-tract 
synthesizer on a digital computer. 1,2 This model was subsequently 
elaborated to include the mechanism of voiceless sound generation as 
well, 3 and was used for the synthesis of simple speech samples. 

In this early work, the vocal cords were approximated as a simple 
mechanical oscillator, composed of single opposing masses, springs, and 
nonlinear damping; that is, a so-called one-mass approximation of each 
vibrating cord. The oscillating masses were permitted only lateral 
displacement and were driven by a function of the subglottal pressure 
and the Bernoulli pressure in the glottal orifice. The heretofore much- 
used assumption of linear separability of sound source and vocal tract 
was not made, and acoustic factors such as voice pitch, waveform of 
glottal flow, and glottal duty factor were derived as self-determined 
functions of physiological parameters, namely, subglottal (lung) 
pressure, vocal-cord tension (or natural frequency), "neutral" glottal 
area, and vocal-tract shape. 
The waveforms of glottal area and volume velocity obtained in this 
first study were similar to those observed in high-speed motion pictures 
of the human vocal cords and in inverse filtering of natural speech. 
Further, the results revealed how the acoustic interaction between the 
vocal cords and the vocal-tract shape (through its driving-point im- 
pedance) could influence the waveform and period of the glottal flow. 
Control of the physiological parameters (subglottal pressure, cord 
tension, neutral area, and vocal-tract shape) was shown to be sufficient 
for the synthesis of voiced and voiceless sounds. 3 

Although the one-mass model could produce acceptable voiced-sound 
synthesis and simulate many of the properties of glottal flow, it was 
inadequate to produce other physiological detail in vocal cord behavior. 
For example, the amount of acoustic interaction displayed between 
source and tract was greater than observed in human speech.* The 
one-mass model was congenitally incapable of sustained oscillation for 
a capacitive input load of the vocal tract, corresponding to oscillation 
at a frequency just above a formant (or eigen) frequency of the tract. 
Also, a physiologically-natural correlate of chest and falsetto registers 
and a phase-difference in the motion of the cord edges were lacking. 

To incorporate more physiological properties, multiple-mass repre- 
sentations of the cords were therefore considered. 4-6 The cord ligaments 
can be mechanically represented with as distributed a system as desired, 
i.e., periodic structures of masses, springs, and losses. However, theoret- 
ical work has shown that a two-mass approximation 6,7 can account 
for most of the relevant glottal detail, including phase differences of 
upper and lower edges and oscillation for a capacitive input impedance 
of the vocal tract. An initial effort at computer simulation of these 
factors 4 produced realistic phase differences and chest-falsetto dichotomy, 
but nonrealistic dependence on acoustic load. The difficulty lay in the 
equivalent circuit of the glottal orifice, the manner of its control, and the 
physiological data available for the simulation. 

The present work seeks a comprehensive and definitive treatment 
of the relevant acoustic theory and the existing physiological data. As 
in the earlier study, 2 simulation on an interactive DDP-516 laboratory 
computer is the tool by which the model is assessed and the unknown 
constants are estimated. In the sequel, the level of understanding and 
the realism attained by the two-mass model will be discussed. 



* The amount of interaction is critically dependent upon the trans-glottal pressure 
distribution. In the first work, van den Berg's measurements of glottal pressure were 
used. 
III. MECHANICAL RELATIONS FOR THE TWO-MASS MODEL 

The vocal cords are assumed to be bilaterally symmetric. The prop- 
erties of only one cord are therefore discussed, the same being implied 
for the opposing cord. A schematic diagram of the glottal system is 
shown in Fig. 1. The trachea, leading to the lungs, is represented by 
the pipe to the left. The larynx tube, leading to the vocal tract, is to 
the right. These tubes are assumed to be cylindrical in shape and are 
fixed in size. The glottis constitutes a constriction between these tubes, 
and the size of the constriction depends upon the cord displacement. 
The inlet to the glottal constriction occurs over the contraction 
distance $l_c$. Expansion back to the vocal-tract cross section occurs over 
the distance $l_t$. Aerodynamic pressures relevant to the following 
discussion are indicated in Fig. 1. 

In the two-mass model, the vocal cord is divided in depth (thickness) 
into an upper and a lower part. Each part consists of a simple mechanical 
oscillator having a mass, spring, and damping ($m$, $s$, and $r$). The two 
masses of a cord, $m_1$ and $m_2$, are permitted only lateral motion, $x_1$ and $x_2$, 
and the masses are coupled by a linear spring of stiffness $k_c$, as shown 
in Fig. 1. Other factors shown in Fig. 1 are: 

$l_g$: the effective length of the vocal cords (or of the glottal slit), 

$d_1$ and $d_2$: the thicknesses of $m_1$ and $m_2$, respectively, 

$s_1$ and $s_2$: the equivalent springs, 

$r_1$ and $r_2$: the equivalent viscous resistances, 

$A_{g01}$ and $A_{g02}$: the cross-sectional areas of the glottal slit when $m_1$ 
and $m_2$ are at rest (i.e., the "phonation neutral" areas), 

$U_g$: the average volume velocity across the glottal area. 
Fig. 1 — Schematic diagram of the two-mass approximation of the vocal cords. 
[Regions shown, left to right: contraction, glottis, expansion.] 
Owing to the assumption of bilateral symmetry, variations in cross- 
sectional areas due to the lateral displacements $x_1$ and $x_2$ are doubled 
to give the total area variation; that is, the cross-sectional areas at the 
two masses are: 

$$A_{g1} = A_{g01} + 2 l_g x_1, \qquad A_{g2} = A_{g02} + 2 l_g x_2.$$

3.1 Nature of the Vocal-Cord Springs 

The function of the linear coupling spring, $k_c$, is to represent an 
effect of flexural stiffness in the lateral direction of the vocal cords. 
This variable flexural stiffness results from varying the thickness and 
stiffness of the cords by action of the thyroarytenoid muscle (vocalis). 

The springs $s_1$ and $s_2$ are an equivalent representation of the tension 
of the vocal cords, which becomes firmer due to contraction of the 
anterior cricothyroid muscle and other muscles. The springs $s_1$ and $s_2$ 
are given a nonlinear characteristic to conform to the stiffness as 
measured on fresh, excised human vocal cords. 8 The nonlinear relation 
between the deflection from the equilibrium position and the force 
required to produce this deflection is given by 

$$f_{kj} = k_j x_j \left(1 + \eta_{kj} x_j^2\right), \qquad j = 1, 2, \tag{1}$$

where $f_{kj}$ is the force required to produce $x_j$, $k_j$ is the linear stiffness, 
and $\eta_{kj}$ is the coefficient describing the nonlinearity of the spring $s_j$, 
being positive in this case. 

During closure of the glottis, the model should satisfy realistic 
conditions at the colliding surfaces of the vibrating masses $m_1$ and $m_2$ with 
their opposing counterparts. A contact force at collision will cause 
some deformation in the flesh of the vocal cords. The restoring force 
at this deformation can be represented by an equivalent spring 
$s_{hj}$ ($j = 1, 2$). For simplicity, a nonlinear characteristic similar to 
eq. (1) is assumed for the spring $s_{hj}$, that is, 

$$f_{hj} = h_j\left(x_j + \frac{A_{g0j}}{2l_g}\right)\left\{1 + \eta_{hj}\left(x_j + \frac{A_{g0j}}{2l_g}\right)^2\right\} \tag{2}$$

for 

$$x_j + \frac{A_{g0j}}{2l_g} \le 0, \qquad j = 1, 2,$$

where $f_{hj}$ is the force required to produce the deformation of mass $m_j$ 
during collision, $h_j$ is the linear stiffness, and $\eta_{hj}$ is a positive coefficient 
representing the nonlinearity of the contacting vocal cords. The resultant 
restoring force acting on $m_j$ during closure is, therefore, the sum of 
$f_{kj}$ and $f_{hj}$, that is, of eq. (1) and eq. (2). This change in spring stiffness at 
closure is schematically illustrated in Fig. 2. 
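The piecewise restoring force of eqs. (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' Fortran IV program; the function name `spring_force` and its argument names are hypothetical, and CGS units (cm, g, dyn) are assumed throughout.

```python
def spring_force(x, k, eta_k, h, eta_h, a_g0, l_g):
    """Restoring force (dyn) on one cord mass, following eqs. (1) and (2).

    x      lateral displacement x_j (cm); negative values move toward the midline
    k      linear stiffness k_j (dyn/cm)
    eta_k  nonlinearity coefficient eta_kj (1/cm^2)
    h      linear contact stiffness h_j (dyn/cm)
    eta_h  contact nonlinearity coefficient eta_hj (1/cm^2)
    a_g0   phonation-neutral area A_g0j (cm^2)
    l_g    glottal length (cm)
    """
    # Eq. (1): the tissue spring acts at every displacement.
    force = k * x * (1.0 + eta_k * x * x)
    # Depth past the midline; y <= 0 means the opposing cords are in contact.
    y = x + a_g0 / (2.0 * l_g)
    if y <= 0.0:
        # Eq. (2): the contact spring is added only during collision.
        force += h * y * (1.0 + eta_h * y * y)
    return force


# Typical lower-mass values quoted in Section IX: k_1 = 80 kdyn/cm,
# eta_k1 = 100, h_1 = 3 k_1, eta_h1 = 500, A_g01 = 0.05 cm^2, l_g = 1.4 cm.
K1 = 80e3
open_force = spring_force(0.01, K1, 100.0, 3 * K1, 500.0, 0.05, 1.4)
closed_force = spring_force(-0.03, K1, 100.0, 3 * K1, 500.0, 0.05, 1.4)
```

With these values the contact threshold is $x_{1\min} = -0.05/2.8 \approx -0.018$ cm, so at $x_1 = -0.03$ cm the contact term roughly doubles the restoring force compared with the tissue spring alone, which is the stiffening sketched in Fig. 2.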
Fig. 2 — Characteristics of the nonlinear stiffnesses. 



3.2 Nature of the Vocal-Cord Losses 

As in the earlier formulation, 1 the viscous loss of the vibrating cords 
is assumed piece-wise linear. The loss is caused to increase step-wise 
on closure of the cords to represent the "stickiness" of the soft, moist 
contacting surfaces as they form together. 

It is convenient to express the equivalent viscous resistances in terms 
of damping ratios, $\zeta_1$ and $\zeta_2$, for the uncoupled oscillators, where 

$$r_1 = 2\zeta_1\sqrt{m_1 k_1} \quad\text{and}\quad r_2 = 2\zeta_2\sqrt{m_2 k_2}, \tag{3}$$

and where, as shown in eq. (1), $k_1$ and $k_2$ are the linear components of 
stiffness for the springs $s_1$ and $s_2$. During the open-glottis condition, 
the loss is taken as $\zeta_1 = 0.1$ and $\zeta_2 = 0.6$ for a typical condition of 
the cord model. As in the earlier work, the loss during the closed-glottis 
condition is taken essentially as critical damping, namely 

$$\zeta_1 = (1.0 + 0.1) \quad\text{and}\quad \zeta_2 = (1.0 + 0.6). \tag{4}$$
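Eqs. (3) and (4) translate directly into code. The sketch below is illustrative only; the helper names are hypothetical, and deciding which mass counts as "closed" is left to the caller, since the text states only that the loss steps up on closure of the cords.

```python
import math


def viscous_resistance(zeta, m, k):
    """Eq. (3): r = 2 * zeta * sqrt(m * k), in dyn*s/cm for m in g, k in dyn/cm."""
    return 2.0 * zeta * math.sqrt(m * k)


def damping_ratio(zeta_open, closed):
    """Open-glottis ratio, or eq. (4): add 1.0 (critical damping) on closure."""
    return zeta_open + 1.0 if closed else zeta_open


# "Typical" lower mass from Section IX: m_1 = 0.125 g, k_1 = 80 kdyn/cm, zeta_1 = 0.1.
r1_open = viscous_resistance(damping_ratio(0.1, closed=False), 0.125, 80e3)
r1_closed = viscous_resistance(damping_ratio(0.1, closed=True), 0.125, 80e3)
```

For the typical lower mass, $\sqrt{m_1 k_1} = 100$, so the resistance jumps from about 20 to about 220 dyn·s/cm when the glottis closes.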



IV. PRESSURE DISTRIBUTION ALONG THE GLOTTIS 

Because of the small dimensions of the glottis (compared to a wave- 
length at the frequencies of interest), and because of the high velocity 
of the glottal flow (compared to the vocal-cord velocity), we can assume 
the glottal flow to be quasi-steady. 7 We shall use the Bernoulli equation 
for one-dimensional flow to obtain the pressure distribution along the 
glottal flow. 

The abrupt contraction in cross-sectional area at the inlet to the 
glottis produces a vena contracta surrounded by stagnant air. The 
vena contracta makes the inlet area $A_{g1}$ appear smaller than it actually 
is and the pressure drop greater than that dictated by an ideal area 
change. The loss factor for such a contraction has been studied in fluid-flow 
experiments 9 and found to be on the order of 0.4 to 0.5. Flow 
measurements by van den Berg et al. 10 on plaster-cast models of the 
larynx set the loss figure at 0.37. This latter figure is therefore taken 
to estimate the pressure drop at the inlet, and we fix this drop at 

$$P_{B1}(1.00 + 0.37), \quad\text{or}\quad 0.685\,\rho\,\frac{U_g^2}{A_{g1}^2},$$

where $P_{B1} = \tfrac{1}{2}\rho u_{g1}^2$ is the Bernoulli pressure, $\rho$ the air density, and 
$u_{g1}$ the particle velocity at the lower cord edge. 

Within the constriction formed by the lower part of the cord, the 
pressure drop is assumed to be governed by viscous loss, which is also 
consistent with van den Berg's measurements. In this region the pressure 
falls linearly with distance according to a resistance to the volume flow 
equal to $12\mu d_1 l_g^2/A_{g1}^3$, where $\mu$ is the shear viscosity coefficient. 

At the junction between the masses $m_1$ and $m_2$, the volume flow $U_g$ 
is continuous, but the particle velocity changes. There is a corresponding 
abrupt change in pressure equal to the change in kinetic energy per unit 
volume of the fluid. This pressure change at the junction is 

$$\Delta p = \tfrac{1}{2}\rho\left(u_{g2}^2 - u_{g1}^2\right) = \tfrac{1}{2}\rho U_g^2\left(\frac{1}{A_{g2}^2} - \frac{1}{A_{g1}^2}\right).$$

Throughout the constriction formed by the upper cord edge, $m_2$, 
viscous loss is assumed to govern the pressure drop and, as for the lower 
cord portion, the resistance is taken as $12\mu d_2 l_g^2/A_{g2}^3$. 

At the abrupt expansion of the glottal outlet, the pressure recovers 
toward the atmospheric value (assuming no constrictions in the relatively 
large vocal tract). Van den Berg, in his model flow measurements, 
found the pressure recovery to be about $0.5\,P_{B2}$. However, for small 
constrictions this measurement is difficult and uncertain. It seems 
preferable to base an estimate of the pressure recovery on momentum 
considerations, which hold in the theory of fluid flow. 

Consider at the sudden expansion the relations for Newton's law, 
$f = (d/dt)(mv)$. Then, because $U_g$ is continuous, 

$$\rho U_g\left(u_1 - u_{g2}\right) = A_1\left(P_{22} - P_1\right)$$

or 

$$P_1 - P_{22} = \tfrac{1}{2}\rho u_{g2}^2\left[2N(1 - N)\right] = \tfrac{1}{2}\rho\,\frac{U_g^2}{A_{g2}^2}\left[2N(1 - N)\right] = P_{B2}\left[2N(1 - N)\right], \tag{6}$$

where $N = A_{g2}/A_1$, $P_{B2} = \tfrac{1}{2}\rho u_{g2}^2$, $u_1$ is the particle velocity at the 
vocal-tract inlet, and $A_1$ is the input area to the vocal tract. The value of 
$2N(1 - N)$ is typically on the order of 0.05 to 0.40, which is somewhat 
smaller than van den Berg's value. This difference is significant to the 
acoustic interaction between the vocal tract and the cord source. 1 The 
pressure distribution along the steady flow through the glottis is 
indicated in Fig. 3. 
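The steady pressure distribution of Fig. 3 can be tabulated by chaining the drops described above: the 1.37 inlet loss, the two viscous drops, the junction term, and the momentum-based outlet recovery of eq. (6). This is a sketch under stated assumptions: the air density and viscosity values below are typical CGS figures for warm, moist air, not values given in this section, and the function names are hypothetical.

```python
RHO = 1.14e-3  # air density, g/cm^3 (assumed value, not given in this section)
MU = 1.86e-4   # shear viscosity of air, dyn*s/cm^2 (assumed value)


def recovery_fraction(a_g2, a_1):
    """Eq. (6): (P_1 - P_22) / P_B2 = 2N(1 - N), with N = A_g2 / A_1."""
    n = a_g2 / a_1
    return 2.0 * n * (1.0 - n)


def steady_glottal_pressures(p_s, u_g, a_g1, a_g2, a_1, d1, d2, l_g,
                             rho=RHO, mu=MU):
    """Pressures (dyn/cm^2) along a steady glottal flow (dU_g/dt = 0).

    Returns (P_11, P_12, P_21, P_22, P_1) for volume velocity u_g (cm^3/s).
    """
    p11 = p_s - 1.37 * 0.5 * rho * (u_g / a_g1) ** 2                # inlet (vena contracta)
    p12 = p11 - 12.0 * mu * d1 * l_g ** 2 / a_g1 ** 3 * u_g         # viscous drop over m_1
    p21 = p12 - 0.5 * rho * u_g ** 2 * (1 / a_g2 ** 2 - 1 / a_g1 ** 2)  # junction
    p22 = p21 - 12.0 * mu * d2 * l_g ** 2 / a_g2 ** 3 * u_g         # viscous drop over m_2
    p1 = p22 + 0.5 * rho * (u_g / a_g2) ** 2 * recovery_fraction(a_g2, a_1)  # outlet recovery
    return p11, p12, p21, p22, p1


# Example: P_s = 8 cm H2O (about 7840 dyn/cm^2, using 1 cm H2O ~ 980 dyn/cm^2).
profile = steady_glottal_pressures(8 * 980.0, 300.0, 0.1, 0.1, 5.0, 0.15, 0.15, 1.4)
```

Note that $2N(1-N)$ never exceeds 0.5, reached only when $A_{g2} = A_1/2$; for a glottis much smaller than the tract inlet the momentum estimate therefore always lies below van den Berg's figure of $0.5\,P_{B2}$.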

In the time-varying condition of the cords, the inertance of the air 
masses involved should also be taken into account. When combined 
with the loss terms just discussed, the pressure distribution along the 
glottis is described by 

$$
\begin{aligned}
P_{11} - P_{12} &= \frac{12\mu d_1 l_g^2}{A_{g1}^3}\,U_g + \frac{\rho d_1}{A_{g1}}\,\frac{dU_g}{dt},\\
P_{12} - P_{21} &= \frac{\rho}{2}\,U_g^2\left(\frac{1}{A_{g2}^2} - \frac{1}{A_{g1}^2}\right),\\
P_{21} - P_{22} &= \frac{12\mu d_2 l_g^2}{A_{g2}^3}\,U_g + \frac{\rho d_2}{A_{g2}}\,\frac{dU_g}{dt}.
\end{aligned}
\tag{7}
$$

V. EQUIVALENT CIRCUIT FOR THE GLOTTIS 



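On the basis of the pressure-difference relations of eq. (7), the acoustic 
impedance elements of the glottal orifice constitute the equivalent 
circuit shown in Fig. 4, where the current $U_g$ is continuous. The elements 
of the acoustic circuit are given by 

$$
\begin{aligned}
R_c &= 1.37\,\frac{\rho}{2}\,\frac{|U_g|}{A_{g1}^2}, & L_c &= \int_0^{l_c}\frac{\rho}{A_c(x)}\,dx,\\
R_{v1} &= \frac{12\mu d_1 l_g^2}{A_{g1}^3}, & L_{g1} &= \frac{\rho d_1}{A_{g1}},\\
R_{v2} &= \frac{12\mu d_2 l_g^2}{A_{g2}^3}, & L_{g2} &= \frac{\rho d_2}{A_{g2}},\\
R_{12} &= \frac{\rho}{2}\,|U_g|\left(\frac{1}{A_{g2}^2} - \frac{1}{A_{g1}^2}\right), & R_e &= -\frac{\rho}{2}\,\frac{|U_g|}{A_{g2}^2}\left[2\,\frac{A_{g2}}{A_1}\left(1 - \frac{A_{g2}}{A_1}\right)\right].
\end{aligned}
\tag{8}
$$

* The $U\,(dL/dt)$ term in $(d/dt)(LU)$ is negligible, where $L = \rho d/A$. 

Fig. 3 — Pressure distribution along the glottal flow. 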



Fig. 4 — Equivalent circuit for the glottis. [Series elements, left to right: 
$R_c$, $L_c$ (contraction); $R_{v1}$, $L_{g1}$ ($m_1$); $R_{12}$ (boundary); $R_{v2}$, $L_{g2}$ ($m_2$); 
$R_e$ (expansion); node pressures $P_{11}$, $P_{12}$, $P_{21}$, $P_{22}$.] 

The total acoustic impedance of the glottis, $Z_g$, is therefore 

$$Z_g = \frac{\rho}{2}\,|U_g|\left[\frac{0.37}{A_{g1}^2} + \frac{1 - 2\dfrac{A_{g2}}{A_1}\left(1 - \dfrac{A_{g2}}{A_1}\right)}{A_{g2}^2}\right] + (R_{v1} + R_{v2}) + j\omega(L_{g1} + L_{g2} + L_c) \tag{9}$$

or 

$$Z_g = (R_{k1} + R_{k2})\,|U_g| + (R_{v1} + R_{v2}) + j\omega(L_{g1} + L_{g2} + L_c), \tag{10}$$

where 

$$R_{k1} = \frac{0.19\,\rho}{A_{g1}^2}, \qquad R_{k2} = \frac{\rho\left[0.5 - \dfrac{A_{g2}}{A_1}\left(1 - \dfrac{A_{g2}}{A_1}\right)\right]}{A_{g2}^2}.$$

In general, $L_c$ can be neglected in comparison with $(L_{g1} + L_{g2})$. 
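Eq. (10) can be evaluated numerically as a complex impedance. The sketch below neglects $L_c$, as the text permits; the air constants are assumed typical CGS values rather than values from this section, and the function name is hypothetical. Note that the coefficient 0.19 in $R_{k1}$ is the document's rounding of $0.37/2 = 0.185$.

```python
RHO = 1.14e-3  # air density, g/cm^3 (assumed value)
MU = 1.86e-4   # shear viscosity, dyn*s/cm^2 (assumed value)


def glottal_impedance(u_g, omega, a_g1, a_g2, a_1, d1, d2, l_g,
                      rho=RHO, mu=MU):
    """Complex glottal impedance Z_g of eq. (10), with L_c neglected.

    Units: cm, g, s; the result is in acoustic ohms (dyn*s/cm^5).
    """
    n = a_g2 / a_1
    r_k1 = 0.19 * rho / a_g1 ** 2                      # inlet (vena contracta) loss
    r_k2 = rho * (0.5 - n * (1.0 - n)) / a_g2 ** 2     # junction plus outlet recovery
    r_v1 = 12.0 * mu * d1 * l_g ** 2 / a_g1 ** 3       # viscous loss over m_1
    r_v2 = 12.0 * mu * d2 * l_g ** 2 / a_g2 ** 3       # viscous loss over m_2
    l_g1 = rho * d1 / a_g1                             # air-mass inertances
    l_g2 = rho * d2 / a_g2
    resistance = (r_k1 + r_k2) * abs(u_g) + (r_v1 + r_v2)
    return complex(resistance, omega * (l_g1 + l_g2))


# Example: U_g = 300 cm^3/s through 0.1-cm^2 openings, evaluated at 125 Hz.
z_example = glottal_impedance(300.0, 2 * 3.141592653589793 * 125,
                              0.1, 0.1, 5.0, 0.15, 0.15, 1.4)
```

The flow-dependent term $(R_{k1}+R_{k2})|U_g|$ is what makes the glottis a nonlinear element; the code simply recomputes it from the current $|U_g|$ and areas each time it is called.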

The glottal impedance relation of eq. (10) can be linked to that 
obtained in flow measurements by van den Berg et al. 10 Using the 
pressure recovery found by van den Berg for the glottal outlet, namely 
$\tfrac{1}{2}P_{B2}$ [instead of the momentum relation of eq. (6)], gives the value 
$R_e = -(\rho/4)\,|U_g|/A_{g2}^2$. For the case $A_{g1} = A_{g2} = A_g$, the total 
glottal impedance becomes 

$$Z_g = 0.87\,\frac{\rho|U_g|}{2A_g^2} + 12\,\frac{\mu d\, l_g^2}{A_g^3} + j\omega L_g, \tag{11}$$

where $d = d_1 + d_2$ and $L_g = \rho d/A_g$ (with $L_c$ neglected). The real part of 
this impedance is identical with that given by van den Berg. 
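A short worked check of eq. (11), using only the element definitions of eq. (8) with van den Berg's outlet recovery substituted for $R_e$: setting $A_{g1} = A_{g2} = A_g$ removes the junction term and leaves the inlet loss of 1.37 minus the recovery of 0.50.

$$
\begin{aligned}
R_{12} &= \frac{\rho}{2}\,|U_g|\left(\frac{1}{A_g^2} - \frac{1}{A_g^2}\right) = 0,\\
R_c + R_e &= \frac{\rho}{2}\,\frac{|U_g|}{A_g^2}\,(1.37 - 0.50) = 0.87\,\frac{\rho|U_g|}{2A_g^2},\\
R_{v1} + R_{v2} &= \frac{12\mu l_g^2\,(d_1 + d_2)}{A_g^3} = 12\,\frac{\mu d\, l_g^2}{A_g^3},\\
L_{g1} + L_{g2} &= \frac{\rho\,(d_1 + d_2)}{A_g} = L_g.
\end{aligned}
$$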



VI. MODEL SYSTEM FOR VOICED SOUNDS 



A network representation of the vocal system for voiced sounds is 
shown in Fig. 5. Beginning at the left, the subglottal system (comprising 
the trachea, bronchi, and lungs) is neglected and, as in the earlier 
study, 1 the subglottal pressure is approximated by a constant excess 
pressure in the lungs. Neglecting the subglottal system is also based on 
the finding that its first resonance is relatively high, with a mean value 
of 650 Hz and a bandwidth of 250 Hz. These values were determined 
from direct measurements of the subglottal driving-point impedance 
made on five laryngectomized subjects. 11 The 650 Hz figure is considerably 
higher than the value of 300 Hz reported by van den Berg. 

Fig. 5 — Network model for the synthesis of voiced sounds. 

The vocal tract is represented in Fig. 5 as a transmission line of $n$ 
cylindrical, hard-walled sections, the element values of which are 
determined by the cross-sectional areas $A_1, \ldots, A_n$ and the cylinder lengths 
$l_1, \ldots, l_n$. 12 The inductances are $L_n = \rho l_n/(2A_n)$ and the capacitances are 
$C_n = l_n A_n/(\rho c^2)$, where $c$ is the sound velocity. In the present work 
$n = 4$. 

To account in part for tract losses, series resistances $R_n$ are taken 
to have the form of a viscous loss at the pipe wall, namely 
$R_n = (S_n/A_n^2)\sqrt{\rho\mu\omega/2}$, where $S_n$ is the circumference of the $n$th section and 
$\omega$ is the radian frequency. The frequency for evaluation of this loss is 
fixed at the natural frequency of the lower oscillator, $f = (1/2\pi)\sqrt{k_1/m_1}$, 
and a multiplicative coefficient (ATT) is applied to increase the loss 
beyond that contributed by viscous loss at the walls and to produce 
formant bandwidths appropriate to a closed-glottis condition.* (The 
typical range for ATT is 20 to 25.) 
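The section element values can be computed as below. This is a minimal sketch: it assumes circular cross sections (so $S_n = 2\sqrt{\pi A_n}$), assumed CGS constants for air, and a hypothetical function name; the `att` argument plays the role of the multiplicative coefficient ATT described above.

```python
import math

RHO = 1.14e-3  # air density, g/cm^3 (assumed value)
MU = 1.86e-4   # shear viscosity, dyn*s/cm^2 (assumed value)
C = 3.5e4      # sound velocity, cm/s (assumed value)


def tract_section(area, length, f_eval, att=1.0):
    """Element values (L_n, C_n, R_n) for one hard-walled cylindrical section.

    L_n is the half-section inductance rho*l/(2A), C_n the shunt capacitance
    l*A/(rho*c^2), and R_n the wall loss att * (S/A^2) * sqrt(rho*mu*omega/2)
    evaluated at the frequency f_eval (Hz). CGS units throughout.
    """
    circumference = 2.0 * math.sqrt(math.pi * area)  # circular section assumed
    inductance = RHO * length / (2.0 * area)
    capacitance = length * area / (RHO * C ** 2)
    omega = 2.0 * math.pi * f_eval
    resistance = att * circumference / area ** 2 * math.sqrt(RHO * MU * omega / 2.0)
    return inductance, capacitance, resistance


# Uniform 16-cm, 5-cm^2 test tract split into n = 4 sections; the loss is
# evaluated at f = (1/2pi) sqrt(k_1/m_1) for k_1 = 80 kdyn/cm, m_1 = 0.125 g.
f_lower = math.sqrt(80e3 / 0.125) / (2.0 * math.pi)   # about 127 Hz
sections = [tract_section(5.0, 4.0, f_lower, att=20.0) for _ in range(4)]
```

Fixing the loss frequency at the lower-mass natural frequency keeps $R_n$ constant during a simulation, which is what lets it enter the difference equations as an ordinary resistor.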

The transmission line is terminated in a radiation load equal to that 
for a circular piston in an infinite baffle, namely $L_R = 8\rho/(3\pi\sqrt{\pi A_n})$ 
and $R_R = 128\rho c/(9\pi^2 A_n)$, where $A_n$ is the final (mouth) area. 12 
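The radiation elements follow directly from the mouth area. The constants below are assumed CGS values for air, and the function name is hypothetical.

```python
import math

RHO = 1.14e-3  # air density, g/cm^3 (assumed value)
C = 3.5e4      # sound velocity, cm/s (assumed value)


def radiation_load(mouth_area):
    """Piston-in-infinite-baffle radiation elements (L_R, R_R), CGS units."""
    l_r = 8.0 * RHO / (3.0 * math.pi * math.sqrt(math.pi * mouth_area))
    r_r = 128.0 * RHO * C / (9.0 * math.pi ** 2 * mouth_area)
    return l_r, r_r


# 5-cm^2 mouth of the uniform test tract used in Section IX.
L_R, R_R = radiation_load(5.0)
```

In the network these two elements sit in parallel, which is why the 4-loop and 5-loop equations both involve $L_R$.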

From Fig. 5, the differential equations which relate the volume 
velocities of the system are: 

$$
\begin{aligned}
&\text{(g-loop)} && (R_{k1}+R_{k2})|U_g|U_g + (R_{v1}+R_{v2})U_g + (L_{g1}+L_{g2})\frac{dU_g}{dt} + L_1\frac{dU_g}{dt} + R_1U_g + \frac{1}{C_1}\int_0^t (U_g - U_1)\,dt - P_s = 0\\
&\text{(1-loop)} && (L_1+L_2)\frac{dU_1}{dt} + (R_1+R_2)U_1 + \frac{1}{C_2}\int_0^t (U_1 - U_2)\,dt + \frac{1}{C_1}\int_0^t (U_1 - U_g)\,dt = 0\\
&\text{(2-loop)} && (L_2+L_3)\frac{dU_2}{dt} + (R_2+R_3)U_2 + \frac{1}{C_3}\int_0^t (U_2 - U_3)\,dt + \frac{1}{C_2}\int_0^t (U_2 - U_1)\,dt = 0\\
&\text{(3-loop)} && (L_3+L_4)\frac{dU_3}{dt} + (R_3+R_4)U_3 + \frac{1}{C_4}\int_0^t (U_3 - U_L)\,dt + \frac{1}{C_3}\int_0^t (U_3 - U_2)\,dt = 0\\
&\text{(4-loop)} && (L_4+L_R)\frac{dU_L}{dt} + R_4U_L - L_R\frac{dU_R}{dt} + \frac{1}{C_4}\int_0^t (U_L - U_3)\,dt = 0\\
&\text{(5-loop)} && L_R\frac{d}{dt}(U_R - U_L) + R_RU_R = 0
\end{aligned}
\tag{12}
$$

* Other vocal-tract losses not included per se are those arising from nonrigid walls 
and from heat conduction at the wall. The former is quite significant in lower-formant 
damping. The latter is essentially negligible. See Ref. 12. 

VII. FORCING RELATIONS FOR THE VOCAL-CORD OSCILLATOR 

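The masses of the cord oscillator are driven by mean pressures acting 
on their exposed faces, namely, 

$$P_{m1} = \tfrac{1}{2}(P_{11} + P_{12}) = P_s - 1.37\,\frac{\rho}{2}\left(\frac{U_g}{A_{g1}}\right)^2 - \frac{1}{2}\left(R_{v1}U_g + L_{g1}\frac{dU_g}{dt}\right)$$

and 

$$P_{m2} = \tfrac{1}{2}(P_{21} + P_{22}) = P_{m1} - \frac{1}{2}\left\{(R_{v1} + R_{v2})U_g + (L_{g1} + L_{g2})\frac{dU_g}{dt}\right\} - \frac{\rho}{2}\,U_g^2\left(\frac{1}{A_{g2}^2} - \frac{1}{A_{g1}^2}\right). \tag{13}$$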
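Eq. (13) evaluates to two scalars per time sample. The sketch below takes the viscous resistances and inertances as inputs (so it can be fed from any implementation of the element formulas); the default air density is an assumed value and the function name is hypothetical.

```python
def mean_forcing_pressures(p_s, u_g, du_dt, a_g1, a_g2, r_v1, r_v2,
                           l_g1, l_g2, rho=1.14e-3):
    """Eq. (13): mean pressures P_m1, P_m2 (dyn/cm^2) on the faces of m_1, m_2.

    rho = 1.14e-3 g/cm^3 is an assumed air density.
    """
    # Inlet loss, then half of the resistive and inertive drop across m_1.
    p_m1 = (p_s
            - 1.37 * 0.5 * rho * (u_g / a_g1) ** 2
            - 0.5 * (r_v1 * u_g + l_g1 * du_dt))
    # Remaining half of m_1's drop, half of m_2's drop, and the junction term.
    p_m2 = (p_m1
            - 0.5 * ((r_v1 + r_v2) * u_g + (l_g1 + l_g2) * du_dt)
            - 0.5 * rho * u_g ** 2 * (1.0 / a_g2 ** 2 - 1.0 / a_g1 ** 2))
    return p_m1, p_m2
```

With no flow both faces simply see $P_s$; with forward flow the upper face always sees the lower mean pressure when $A_{g2} \le A_{g1}$.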

The exposed areas are $l_g d_1$ and $l_g d_2$, respectively. A shape of the cords 
is assumed such that the forces $F_1$ and $F_2$ acting on $m_1$ and $m_2$ over 
their displacements $x_1$ and $x_2$ are: 

$$
\begin{array}{cc|cc}
x_1 & x_2 & F_1/(l_g d_1) & F_2/(l_g d_2)\\
\hline
x_1 > x_{1\min} & x_2 > x_{2\min} & P_{m1} & P_{m2}\\
x_1 \le x_{1\min} & x_2 > x_{2\min} & P_s & 0\\
x_1 > x_{1\min} & x_2 \le x_{2\min} & P_s & P_s\\
x_1 \le x_{1\min} & x_2 \le x_{2\min} & P_s & 0
\end{array}
\tag{14}
$$
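where $x_{1\min} = -A_{g01}/2l_g$, $x_{2\min} = -A_{g02}/2l_g$, and $A_{g01}$, $A_{g02}$ are 
the "phonation neutral" values of the glottal area. The equations of 
motion for the two masses are therefore: 

$$
\begin{aligned}
m_1\frac{d^2x_1}{dt^2} + r_1\frac{dx_1}{dt} + s_1(x_1) + k_c(x_1 - x_2) &= F_1,\\
m_2\frac{d^2x_2}{dt^2} + r_2\frac{dx_2}{dt} + s_2(x_2) + k_c(x_2 - x_1) &= F_2,
\end{aligned}
$$

where 

$$A_{g1} = A_{g01} + 2l_g x_1, \qquad A_{g2} = A_{g02} + 2l_g x_2,$$

$$s_i(x_i) = k_i\left(x_i + \eta_{ki}x_i^3\right), \qquad i = 1, 2, \qquad \text{for } x_i > -\frac{A_{g0i}}{2l_g},$$

and 

$$s_i(x_i) = k_i\left(x_i + \eta_{ki}x_i^3\right) + h_i\left\{\left(x_i + \frac{A_{g0i}}{2l_g}\right) + \eta_{hi}\left(x_i + \frac{A_{g0i}}{2l_g}\right)^3\right\} \qquad \text{for } x_i \le -\frac{A_{g0i}}{2l_g}, \tag{15}$$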

and $F_1$ and $F_2$ are given by the force table of eq. (14). These equations 
are coupled to the flow equations through the fact that $x_1$ and $x_2$ 
determine $A_{g1}$ and $A_{g2}$. Also note that the coupling between the masses, 
which are permitted only lateral motion, has been linearized to be 
proportional to $(x_2 - x_1)$. [A more detailed consideration of the 
elongation produced in the coupling spring by a displacement difference 
$(x_2 - x_1)$, and of the lateral component of restoring force, leads to 
modifying the coupling term to $2k_c(x_2 - x_1)^3/(d_1 + d_2)^2$.] 
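The force table of eq. (14) is a four-way branch on whether each edge is open. The sketch below reads the table as: both edges open gives $P_{m1}$, $P_{m2}$; the lower edge closed gives $P_s$ on $m_1$ and no driving pressure on $m_2$; only the upper edge closed gives $P_s$ on both. The function name and argument order are hypothetical.

```python
def cord_forces(x1, x2, a_g01, a_g02, l_g, d1, d2, p_s, p_m1, p_m2):
    """Driving forces (F_1, F_2) in dyn, following the table of eq. (14)."""
    x1_min = -a_g01 / (2.0 * l_g)
    x2_min = -a_g02 / (2.0 * l_g)
    lower_open = x1 > x1_min
    upper_open = x2 > x2_min
    if lower_open and upper_open:
        p1, p2 = p_m1, p_m2      # air flows through the whole glottis
    elif not lower_open:
        p1, p2 = p_s, 0.0        # lower edge closed (upper edge open or closed)
    else:
        p1, p2 = p_s, p_s        # only the upper edge closed: trapped P_s acts on both
    # Pressures act over the exposed areas l_g*d_1 and l_g*d_2.
    return l_g * d1 * p1, l_g * d2 * p2
```

Checking the lower edge first folds rows two and four of the table into one branch, since both give $(P_s, 0)$.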

VIII. DIGITAL SIMULATION 

The differential equations are approximated by difference equations 
in which 

$$\frac{df(t)}{dt} \approx \frac{f(t_i) - f(t_{i-1})}{t_i - t_{i-1}} = \frac{f_i - f_{i-1}}{T},$$

$$\int_0^{t} f(t)\,dt \approx \sum_{j=0}^{i}(t_j - t_{j-1})\,f(t_j) = T\sum_{j=0}^{i} f_j. \tag{16}$$

These discrete approximations applied to eqs. (12) and (15) yield: 
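The two rules of eq. (16) are a backward difference and a rectangular running sum. The sketch below applies them to a 100-Hz sine sampled at 10 kHz, one of the sampling rates quoted later in the text; the helper names are hypothetical.

```python
import math


def backward_difference(samples, i, T):
    """Eq. (16): df/dt at t_i is approximated by (f_i - f_{i-1}) / T."""
    return (samples[i] - samples[i - 1]) / T


def running_integral(samples, i, T):
    """Eq. (16): the integral of f from 0 to t_i is approximated by T * sum_{j<=i} f_j."""
    return T * sum(samples[: i + 1])


# 100-Hz sine sampled at 10 kHz (T = 100 microseconds).
T = 1e-4
f = [math.sin(2 * math.pi * 100 * j * T) for j in range(201)]
slope = backward_difference(f, 100, T)   # exact derivative at t = 10 ms is 2*pi*100
area = running_integral(f, 100, T)       # exact integral over one full period is 0
```

Both rules are only first-order accurate in $T$, which is one reason the sampling interval must be kept well below the acoustic transit times in the tract.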
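$$
\begin{aligned}
&\text{(g-loop)} && (R_{k1i}+R_{k2i})|U_{gi}|U_{gi} + (R_{v1i}+R_{v2i})U_{gi} + (L_{g1i}+L_{g2i})\frac{U_{gi}-U_{g,i-1}}{T} + L_1\frac{U_{gi}-U_{g,i-1}}{T}\\
&&&\qquad + R_1U_{gi} + \frac{T}{C_1}\sum_{j=0}^{i}(U_{gj}-U_{1j}) - P_s = 0\\
&\text{(1-loop)} && \left(\frac{L_1+L_2}{T} + R_1+R_2\right)U_{1i} - \frac{L_1+L_2}{T}\,U_{1,i-1} + \frac{T}{C_2}\sum_{j=0}^{i}(U_{1j}-U_{2j}) + \frac{T}{C_1}\sum_{j=0}^{i}(U_{1j}-U_{gj}) = 0\\
&\text{(2-loop)} && \left(\frac{L_2+L_3}{T} + R_2+R_3\right)U_{2i} - \frac{L_2+L_3}{T}\,U_{2,i-1} + \frac{T}{C_3}\sum_{j=0}^{i}(U_{2j}-U_{3j}) + \frac{T}{C_2}\sum_{j=0}^{i}(U_{2j}-U_{1j}) = 0\\
&\text{(3-loop)} && \left(\frac{L_3+L_4}{T} + R_3+R_4\right)U_{3i} - \frac{L_3+L_4}{T}\,U_{3,i-1} + \frac{T}{C_4}\sum_{j=0}^{i}(U_{3j}-U_{Lj}) + \frac{T}{C_3}\sum_{j=0}^{i}(U_{3j}-U_{2j}) = 0\\
&\text{(4-loop)} && \left(\frac{L_4+L_R}{T} + R_4\right)U_{Li} - \frac{L_4+L_R}{T}\,U_{L,i-1} - \frac{L_R}{T}\left(U_{Ri}-U_{R,i-1}\right) + \frac{T}{C_4}\sum_{j=0}^{i}(U_{Lj}-U_{3j}) = 0\\
&\text{(5-loop)} && \frac{L_R}{T}\left\{(U_{Ri}-U_{Li}) - (U_{R,i-1}-U_{L,i-1})\right\} + R_RU_{Ri} = 0
\end{aligned}
$$

where 

$$
\begin{aligned}
R_{k1i} &= \frac{0.19\,\rho}{A_{g1,i-1}^2}, & R_{k2i} &= \frac{\rho}{A_{g2,i-1}^2}\left[0.5 - \frac{A_{g2,i-1}}{A_1}\left(1 - \frac{A_{g2,i-1}}{A_1}\right)\right],\\
R_{v1i} &= \frac{12\mu d_1 l_g^2}{A_{g1,i-1}^3}, & R_{v2i} &= \frac{12\mu d_2 l_g^2}{A_{g2,i-1}^3},\\
L_{g1i} &= \frac{\rho d_1}{A_{g1,i-1}}, & L_{g2i} &= \frac{\rho d_2}{A_{g2,i-1}},
\end{aligned}
\tag{17}
$$

and 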
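$$
\begin{aligned}
\frac{m_1}{T^2}\left(x_{1i} - 2x_{1,i-1} + x_{1,i-2}\right) + \frac{r_1}{T}\left(x_{1i} - x_{1,i-1}\right) + s_1(x_{1i}) + k_c\left(x_{1,i-1} - x_{2,i-1}\right) &= F_{1i},\\
\frac{m_2}{T^2}\left(x_{2i} - 2x_{2,i-1} + x_{2,i-2}\right) + \frac{r_2}{T}\left(x_{2i} - x_{2,i-1}\right) + s_2(x_{2i}) + k_c\left(x_{2,i-1} - x_{1,i-1}\right) &= F_{2i},
\end{aligned}
$$

where, for $j = 1, 2$, 

$$s_j(x_{ji}) = k_j\left(x_{ji} + \eta_{kj}\,x_{ji}\,x_{j,i-1}^2\right) \qquad \text{for } x_{ji} > -\frac{A_{g0j}}{2l_g},$$

$$s_j(x_{ji}) = k_j\left(x_{ji} + \eta_{kj}\,x_{ji}\,x_{j,i-1}^2\right) + h_j\left\{\left(x_{ji} + \frac{A_{g0j}}{2l_g}\right) + \eta_{hj}\left(x_{ji} + \frac{A_{g0j}}{2l_g}\right)\left(x_{j,i-1} + \frac{A_{g0j}}{2l_g}\right)^2\right\} \qquad \text{for } x_{ji} \le -\frac{A_{g0j}}{2l_g},$$

and 

$$
\begin{aligned}
\frac{F_{1i}}{l_g d_1} = P_{m1i} &= P_s - 1.37\,\frac{\rho}{2}\left(\frac{U_{gi}}{A_{g1,i-1}}\right)^2 - \frac{1}{2}\left(R_{v1i}U_{gi} + L_{g1i}\,\frac{U_{gi} - U_{g,i-1}}{T}\right),\\
\frac{F_{2i}}{l_g d_2} = P_{m2i} &= P_{m1i} - \frac{1}{2}\left\{(R_{v1i} + R_{v2i})U_{gi} + (L_{g1i} + L_{g2i})\,\frac{U_{gi} - U_{g,i-1}}{T}\right\} - \frac{\rho}{2}\,U_{gi}^2\left(\frac{1}{A_{g2,i-1}^2} - \frac{1}{A_{g1,i-1}^2}\right).
\end{aligned}
\tag{18}
$$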
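Because eq. (18) linearizes the cubic spring terms (replacing $x^3$ by $x_i x_{i-1}^2$) and uses past samples in the coupling term, each new displacement follows from a single division. The sketch below updates one mass; the function name `update_mass` and the strategy of trying the open branch first and switching to the contact branch only when the result lies past $x_{\min}$ are illustrative assumptions, not a description of the original Fortran IV program.

```python
def update_mass(x_prev, x_prev2, x_other_prev, force, m, r, k, eta_k,
                h, eta_h, k_c, a_g0, l_g, T):
    """One step of eq. (18) for a single cord mass; returns x_i (cm).

    x_prev, x_prev2  past displacements x_{i-1}, x_{i-2} of this mass
    x_other_prev     past displacement x_{i-1} of the coupled mass
    force            driving force F_i (dyn) from the table of eq. (14)
    r                damping resistance already chosen for open/closed state
    """
    a = a_g0 / (2.0 * l_g)                      # contact threshold is x = -a
    inertia = m / T ** 2
    damping = r / T
    rhs = (force
           + inertia * (2.0 * x_prev - x_prev2)
           + damping * x_prev
           - k_c * (x_prev - x_other_prev))     # coupling uses past samples
    k_open = k * (1.0 + eta_k * x_prev ** 2)    # linearized tissue spring
    x_open = rhs / (inertia + damping + k_open)
    if x_open > -a:
        return x_open                           # no contact: open branch is consistent
    # Contact branch: add h * (x_i + a) * (1 + eta_h * (x_{i-1} + a)**2).
    k_hit = h * (1.0 + eta_h * (x_prev + a) ** 2)
    return (rhs - k_hit * a) / (inertia + damping + k_open + k_hit)
```

Trying the open branch first is safe here: if it lands at or past $x_{\min}$, the contact branch also lands past $x_{\min}$ (but closer to it), so exactly one branch is self-consistent.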

These difference equations were programmed in Fortran IV and 
compiled for experiment on one of the DDP-516 laboratory computers 
of the Acoustics Research Department at Bell Laboratories. 13 Simultaneous 
solution of eqs. (17) and (18) yields all relevant volume velocities, 
glottal areas, and displacements. The time derivative of the mouth 
volume velocity (i.e., through the radiation load) is a good approximation 
to the radiated sound pressure. 12 Time samples of all dependent variables 
are obtained by iterating the solution for as many samples as are desired. 

The sampling interval T is chosen as the longest interval that yields 
a stable solution to the difference equations. This interval is determined 
primarily by the time required for sound to transit the shortest length 
of the vocal tube. Because the distributed vocal tract is approximated 
as lumped constant T-sections, and because the behavior of these 
elements is further approximated by finite differences, the sampling 
interval T must be considerably shorter than the sound transit time 
through the shortest tube element. In the absence of appropriate 
sampling theory for this situation, the broad range of stable solutions 
was determined interactively on the DDP-516 computer and the 
longest stable interval used. In the present work, sampling rates in the 
range of 10 kHz to 30 kHz were used. 

The iteration "loop" of the equations can be closed owing to the 
manner in which the glottal impedance elements and the forcing func- 
tions are taken to involve samples of glottal area; for example, current 
values of impedance and forcing function involve only past values of 
glottal areas. The iteration, therefore, proceeds as follows. 

The cords and tract are initially assumed at rest, and initial currents 
are zero. The first sample of $U_{gi}$ is calculated from the g-loop using 
$A_{g,i-1} = A_{g0}$ (i.e., $x_{i-1} = 0$). The initial samples of all other loop currents are 
likewise calculated, out to the radiation load. The first sample of $U_{gi}$ is 
then used to calculate the first samples of the forcing functions and, 
from the mechanical equations, the first samples of the displacements 
$x_{1i}$ and $x_{2i}$. The latter dictate new values of $A_{g1}$ and $A_{g2}$, which are 
entered back into the glottal impedance elements for the calculation 
of the next sample of $U_g$ and all other currents. The process is continued 
until as much of the solution as desired is obtained. 
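Computing each new glottal-flow sample from the g-loop of eq. (17) amounts to solving a quadratic, because of the term $(R_{k1i}+R_{k2i})|U_{gi}|U_{gi}$. In a minimal sketch that treats every other quantity in the g-loop (past samples and the tract current $U_{1i}$, taken for instance from the previous pass) as a known constant $c$, which is an assumption since the text solves the loops simultaneously, the coefficients are $a = R_{k1i}+R_{k2i}$ and $b = R_{v1i}+R_{v2i}+(L_{g1i}+L_{g2i}+L_1)/T + R_1 + T/C_1$. The function name is hypothetical.

```python
import math


def solve_g_loop(a, b, c):
    """Root of a*|U|*U + b*U + c = 0 for a >= 0 and b > 0.

    The left side is strictly increasing in U, so the root is unique. The
    rationalized form -2c / (b + sqrt(b^2 + 4a|c|)) covers both signs of U
    and avoids cancellation when a*|c| is small compared with b^2.
    """
    return -2.0 * c / (b + math.sqrt(b * b + 4.0 * a * abs(c)))
```

For forward flow ($c < 0$) this reduces to the positive root of $aU^2 + bU + c = 0$, and with $a = 0$ it falls back to the linear solution $U = -c/b$.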

For synthesis of continuous speech, the vocal-tract area values 
change, as do the values of $P_s$, $A_{g0}$, and the cord constants.* These changes 
are slow by comparison to the sample variations in volume velocities, 
displacements, and pressure. The solutions for the continuously changing 
vocal system can therefore be considered as quasi-steady solutions of 
eqs. (17) and (18), and the mouth output samples taken as the synthetic 
speech signal. 

* As indicated in Fig. 5, a cord-tension parameter, Q, constitutes an input to the 
vocal-cord model. This parameter determines the mechanical constants of the oscillator. 

IX. PHYSIOLOGICAL CONSTANTS FOR THE VOCAL-CORD MODEL 

Few numerical values are available for the physiological parameters 
of the vocal cords. Using the sparse data available, the simulation on 
the DDP-516 computer was used to establish relevant ranges for the 
parameters. 

First, the range of parameters which allows self-oscillation of the 
model was studied for a uniform vocal tract, 16 cm long, 5 cm² in 
cross section, and terminated in the radiation load. The DDP-516 
computer was used interactively to establish the self-oscillation region. 
The allowed oscillation range as a function of $k_2$ and $k_c$ is shown in 
Fig. 6. In this plot, the axes are normalized by the factor $m_1/(m_2 k_1)$. 
The parameters in the figure are the damping ratios of the mechanical 
oscillators, $\zeta_1$ and $\zeta_2$. For these cases, all other glottal parameters 
are held constant at physiologically realistic values: namely, 
$P_s = 8$ cm H2O, $l_g = 1.4$ cm, $A_{g01} = A_{g02} = 0.05$ cm², thickness of the 
vocal cords $d_1 + d_2 = 0.3$ cm, total mass $m_1 + m_2 = 0.15$ g, nonlinear 
coefficients of the springs $\eta_{k1} = \eta_{k2} = 100$ and $\eta_{h1} = \eta_{h2} = 500$, and 
$h_1 = 3k_1$, $h_2 = 3k_2$. In particular, values for the spring constants are 
based upon measurements of static tensile stress versus displacement 
for an excised human larynx. 8 From these measurements, for example, 
$\eta_k$ is deduced to be on the order of 50 to 100. 

For Fig. 6a, the vocal cords are divided into equal parts, with thickness 
and mass 0.15 cm and 0.075 g, respectively, and with $k_1 = 50$ kdyn/cm. 
For Fig. 6b, the lower part of the vocal-cord model is thicker than the 
upper part, that is, $d_1 = 0.25$ cm and $d_2 = 0.05$ cm, and the masses, 
$m_1 = 0.125$ g and $m_2 = 0.025$ g, are chosen proportional to the 
thicknesses, keeping the same total mass of 0.15 g as in Fig. 6a. 

Fig. 6 — Allowed regions of oscillation for the two-mass model. The parameter is 
the open-glottis damping ratio. The vocal-tract shape is for the vowel /a/. 
[Panel (a): $k_1 = 50$ kdyn/cm, $d_1 = d_2 = 0.15$ cm, $m_1 = m_2 = 0.075$ g. 
Panel (b): $k_1 = 80$ kdyn/cm, $d_1 = 0.25$ cm, $d_2 = 0.05$ cm, $m_1 = 0.125$ g, $m_2 = 0.025$ g. 
Curves labeled $\zeta_1/\zeta_2 = 0.1/0.15$ and $0.1/0.6$.] 

Kaneko 11 has measured the damped oscillations of a fresh excised 
human larynx when excited by a mechanical impulse and with no air 
flow through the glottis. From this data, the damping ratio for the 
excised human cords can be estimated to be of the order of 0.1 to 0.2 
(which, incidentally, is the same order as deduced in the earlier simula- 
tions 1 ). This range of damping seems particularly appropriate for the 
bulk of the cords, that is, for $m_1$ of the model.* 

An acoustic load of the vocal tract, whose driving-point impedance 
has an inductive reactance at the fundamental frequency of the vocal- 
cord vibration, acts to enhance oscillation of the model. An increase in 
damping (loss) of the vocal tract at lower frequencies, as would be 
caused by wall vibration in the vocal tract, however, acts to oppose 
oscillation. Also, the tendency to oscillate is suppressed by an increase 
in the mechanical damping of $m_1$ and especially of $m_2$. 

The behavior of the vocal-cord model, calculated for values of $k_2$ 
and $k_c$ specified by the small circle in Fig. 6b, will now be considered. 
This glottal condition is chosen as the "typical" one throughout the 
present study; namely, $k_1 = 80$ kdyn/cm, $k_2 = 8$ kdyn/cm, and 
$k_c = 25$ kdyn/cm. 

* An equivalent damping ratio for the bulk of the cords can be estimated as follows:

$$ r_1 + r_2 = 2\zeta_1\sqrt{m_1 k_1} + 2\zeta_2\sqrt{m_2 k_2}. $$

For k_c → ∞,

$$ r_1 + r_2 = 2\zeta_{\mathrm{equi}}\sqrt{(m_1 + m_2)(k_1 + k_2)}. $$

Substituting (for the "typical" conditions) m_2 = m_1/5, k_2 = k_1/10, ζ_1 = 0.1, and
ζ_2 = 0.6 gives

$$ \zeta_{\mathrm{equi}} = \frac{1}{\sqrt{66}}\left(\sqrt{50}\,\zeta_1 + \zeta_2\right) \approx 0.16, $$

which corresponds favorably with Kaneko's measurements.
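The footnote's arithmetic can be checked numerically. The sketch below equates the total viscous resistance of the two masses with that of a single rigidly joined mass, exactly as in the footnote; the function name `equivalent_damping_ratio` is ours, and cgs units (g, dyn/cm) are assumed.

```python
import math

def equivalent_damping_ratio(m1, m2, k1, k2, zeta1, zeta2):
    """Damping ratio of the single mass obtained when k_c -> infinity.

    Equates r1 + r2 = 2*zeta1*sqrt(m1*k1) + 2*zeta2*sqrt(m2*k2)
    with 2*zeta_equi*sqrt((m1 + m2)*(k1 + k2)).
    """
    r_total = 2 * zeta1 * math.sqrt(m1 * k1) + 2 * zeta2 * math.sqrt(m2 * k2)
    return r_total / (2 * math.sqrt((m1 + m2) * (k1 + k2)))

# "Typical" conditions: m1 = 0.125 g, m2 = m1/5, k1 = 80 kdyn/cm, k2 = k1/10.
zeta_equi = equivalent_damping_ratio(0.125, 0.025, 80e3, 8e3, 0.1, 0.6)
print(f"{zeta_equi:.3f}")  # 0.161
```

Because only the ratios m_2/m_1 and k_2/k_1 enter, the result does not depend on the absolute values of m_1 and k_1.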

X. RESULTS OF THE DIGITAL SIMULATION

The vocal-cord and vocal-tract program, specified by eqs. (17) 
and (18), was used interactively on the DDP-516 computer to calculate 
waveforms of glottal flow, glottal area, and mouth sound pressure. 

10.1 Waveforms for Typical Glottal Conditions 

Measurements made at the typical glottal condition and for a uniform
vocal tract are illustrated in Fig. 7. Waveforms of A_g1, A_g2, U_g, and
mouth sound pressure are shown for the initial 30 ms of voicing. The
negative values of A_g1 and A_g2 indicate glottal closure. (One can imagine
the cords pressing into one another upon contact, and the negative
areas correspond to the continued displacement of the center of mass
of the cords.)
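A minimal sketch of how a negative computed area can be read as closure. It assumes a linear area-displacement relation A_gi = A_g0 + 2 l_g x_i for each glottal section, with an illustrative cord length l_g = 1.4 cm and a hypothetical rest area; the constants and helper names are ours, not taken from this section.

```python
L_G = 1.4    # cord length in cm (assumed, illustrative)
A_G0 = 0.05  # rest (phonation-neutral) area in cm^2 (hypothetical)

def glottal_area(x, a0=A_G0, l_g=L_G):
    """Computed area (cm^2) for lateral displacement x (cm); may be negative."""
    return a0 + 2.0 * l_g * x

def flow_area(x, a0=A_G0, l_g=L_G):
    """Opening seen by the airflow: a non-positive computed area means closure."""
    return max(glottal_area(x, a0, l_g), 0.0)

for x in (-0.03, 0.0, 0.02):
    a = glottal_area(x)
    print(f"x = {x:+.2f} cm  A = {a:+.3f} cm^2  closed = {a <= 0.0}  flow area = {flow_area(x):.3f}")
```

Keeping the signed value lets the displacement continue smoothly through contact, while the clipped value is what a flow computation would use.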

The results show that the phase difference between m_1 and m_2 is
about 55 degrees, and the duty ratio (glottis open time to total period)
is about 0.6. These values compare well with observations which have
been made on human vocal cords by high-speed motion-picture techniques¹⁴
and by inverse filtering.¹⁵ One notices the differences between
the glottal area wave and the corresponding glottal flow wave, as
pointed out in the earlier work.¹ The glottal flow wave is typically
characterized by some temporal detail, asymmetry, and steep falling 
slope, while the area wave shows little temporal detail, is less steep, 
Fig. 7. Vocal-cord and vocal-tract functions computed from the DDP-516
simulation. Glottal areas, A_g1 and A_g2, glottal volume velocity, U_g, and mouth-output
sound pressure are shown for the initial 30 ms of voicing for the neutral vowel /ə/.
and tends to be more symmetrical. Because the cords are massive and
are generally forced at a frequency above their natural frequency, their
mechanical displacement does not reflect the detail of acoustic
interaction which the glottal flow displays. The sound output wave reflects
the periodicity established by the cord oscillator, and the greatest
formant oscillation (or excitation) typically occurs (with about 0.5 ms
delay) at the closing phase of the U_g wave. This effect has been observed
previously.²
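Both the phase difference and the duty ratio are read from sampled waveforms. One way to estimate the phase difference, sketched here as an illustration rather than the procedure used for Fig. 7, is to compare the phase of the first Fourier harmonic of A_g1 and A_g2 over exactly one period. The test waves are hypothetical clipped sinusoids, not simulation output.

```python
import cmath
import math

def fundamental_phase(samples):
    """Phase (rad) of the first DFT harmonic of a record holding exactly one period."""
    n = len(samples)
    c1 = sum(s * cmath.exp(-2j * math.pi * k / n) for k, s in enumerate(samples))
    return cmath.phase(c1)

def phase_lag_deg(lower, upper):
    """Degrees by which `upper` lags `lower` at the fundamental, wrapped to [0, 360)."""
    return math.degrees(fundamental_phase(lower) - fundamental_phase(upper)) % 360.0

# Hypothetical one-period area waves (cm^2): clipped raised sinusoids, upper lagging 55 degrees.
n = 200
lag = math.radians(55.0)
a_g1 = [max(0.0, 0.1 * math.sin(2 * math.pi * k / n) + 0.02) for k in range(n)]
a_g2 = [max(0.0, 0.1 * math.sin(2 * math.pi * k / n - lag) + 0.02) for k in range(n)]
print(f"phase lag = {phase_lag_deg(a_g1, a_g2):.1f} deg")  # prints a value close to 55.0
```

Clipping at zero (closure) does not bias the estimate here because each clipped wave stays symmetric about its own peak.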

10.2 Effect of Cord Stiffnesses 

The normalized k_2 versus k_c plane of Fig. 6 is a convenient medium
for demonstrating the effects of spring constants. Using this plane,
waveforms of U_g, A_g1, and A_g2 are sketched for corresponding stiffnesses
in Fig. 8. As before, the vocal tract is a uniform pipe (/ə/), and other
glottal conditions are kept at their typical values.

An increase in k_c above the typical value reduces the phase difference
between A_g1 and A_g2. It also diminishes the steep falling slope of the
flow waveform and tends to make the wave more symmetrical and
triangular. An increase in k_c also lengthens the build-up
time required for the oscillation to settle to a steady state. For still
larger values of k_c, close to the bounds of the oscillation range, both
Fig. 8. Sketches of cord-tract functions for points on the k_2-k_c plane. The axes
are normalized by the function (m_2k_1/m_1). The vowel is /ə/. Compare with Fig. 6b.
the glottal flow and area waveforms become sinusoidal on a dc
component, and the glottis does not close.

The range of the sinusoidal behavior is expanded if the damping
coefficients are made smaller. This special case is shown in Fig. 6a for
the damping coefficients ζ_1 = 0.1 and ζ_2 = 0.15. Here, k_c places no
limit on the oscillation when k_2 is less than 20 kdyn/cm. Owing
to the large k_c, the two-mass model behaves just like the one-mass model
in the extended region, in which the oscillation is sustained by the
inductive reactance of the vocal tract and glottis. This projecting tail
disappears with an increase in the losses, either of the vocal tract or of
the vocal cords.

In contrast, an increase in k_2, with other conditions constant,
decreases the amplitude of A_g2 without a change of the phase difference.
Further increase of k_2 leads to no closure of A_g2, while A_g1 can close
completely during the cycle. Owing to the small amplitude of A_g2 and
its dc component, the top of the glottal flow wave becomes more rounded,
and the duty ratio increases. A small, broad hump appears on the rising
slope of the glottal flow wave, at the point where the area A_g1 equals A_g2.

By comparison, a decrease in k_2 increases the amplitude of A_g2, and the
glottal waves tend toward a symmetrical form. This same dependence on k_2
and k_c also occurs for the case of equal thicknesses, d_1 = d_2 = 0.15 cm.
A change in proportion of the damping coefficients, ζ_1 and ζ_2, also
influences the relations between A_g1 and A_g2. For example, the typical
condition ζ_1 = 0.1 and ζ_2 = 0.6 produces an amplitude of A_g1 slightly
larger than that of A_g2 for /ə/, as seen in Fig. 7. A smaller value of ζ_2
for the same values of ζ_1 and other parameters produces an amplitude
of A_g2 larger than that of A_g1 without a change in phase difference. A steeper
rising slope of the glottal area wave also results, but the falling slope
remains unchanged.

10.3 Effect of Neutral Area

The behavior of the vocal-cord model with respect to the "phonation-
neutral" area, or the equilibrium value A_g0, is another case where we
can find correspondence between the complex behavior of the human
vocal cords and the vibrations of the vocal-cord model. In human
phonation the neutral area is maintained by laryngeal adjustment.
Typical results from the simulation for different values of A_g0 are
illustrated in Fig. 9. These data were measured for the typical glottal
conditions with ζ_1 = 0.1 and ζ_2 = 0.6 and for the vowel /i/. One sees
that the build-up time required for the oscillation to reach a steady
state increases as A_g0 gets larger. The value A_g0 = 0.30 cm² surpasses
Fig. 9. Effect of the "phonation-neutral" or rest area, A_g0, upon the glottal area.
The vowel is /i/.

a critical limit (about 0.25 cm²) beyond which the model does not
oscillate for these conditions.

During the voicing build-up time the pitch period is much longer
than that of the steady-state oscillation. The change in pitch at the
onset of voicing is similar to the starting motion of the human cords
when they are brought to the phonation position from an open position.
In this instance, the subglottal pressure, which is still low because it has
not yet built up, also contributes to the reduction of the fundamental
frequency. The oscillation frequency before cord closure lies between the
damped natural frequencies of the two mechanical oscillators. This is
consistent with the value calculated from the acoustic theory of the
two-mass model neglecting the collision and the nonlinearity of the springs.
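For orientation, the two damped natural frequencies can be estimated for the typical parameters. This sketch treats each mass as an isolated oscillator on its own spring (k_1 or k_2) and ignores the coupling stiffness k_c, collisions, and airflow; that reading of "the two mechanical oscillators" is our assumption. The damping ratios ζ_1 = 0.1 and ζ_2 = 0.6 are the typical values quoted above.

```python
import math

def damped_natural_frequency(m, k, zeta):
    """Damped natural frequency (Hz) of one mass-spring-damper: f_n * sqrt(1 - zeta^2)."""
    f_n = math.sqrt(k / m) / (2.0 * math.pi)
    return f_n * math.sqrt(1.0 - zeta ** 2)

# Typical conditions in cgs units: masses in g, stiffnesses in dyn/cm.
f1 = damped_natural_frequency(0.125, 80e3, 0.1)  # lower mass m1 on spring k1
f2 = damped_natural_frequency(0.025, 8e3, 0.6)   # upper mass m2 on spring k2
print(f"{f2:.1f} Hz < build-up frequency < {f1:.1f} Hz")  # 72.0 Hz < build-up frequency < 126.7 Hz
```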

Although the model, in this case, does not self-oscillate for A_g0 >
0.25 cm², the largest neutral area that still permits phonation depends on the
damping of the mechanical oscillators and of the vocal tract and on
the subglottal pressure. For ζ_1 = 0.2 and ζ_2 = 0.6, and with P_s =
8 cm H₂O, this maximum area reduces to about 0.20 cm². An
increase in the phonation-neutral area also causes an increase in the
amplitude of the vibration with no significant change in the period of
the steady-state oscillation.

10.4 Effect of Tract Shape

Excitation of the vocal tract by the cord model was studied for 
several vowels. Area waves, glottal flow, and mouth-output sound 
pressure are shown for the vowels /i/, /u/, and /a/ in Figs. 10a, b, 
and c. For all these cases, the typical glottal conditions hold (same as
for /ə/ in Fig. 7).

One notices that the waveforms of glottal area and the fundamental 
frequency are almost independent of the vocal-tract shape, while the 
shape can substantially influence the waveform of the glottal flow, 
similar to the results obtained from the one-mass model in the earlier 
work.¹ The acoustic interaction between the glottal flow and the acoustic
load depends on the resonance characteristics of the vocal tract. Vowels
having high resonant Q for the first formant show noticeable interaction
in the glottal flow wave, as is seen for /a/. A low first formant can also
affect the glottal flow wave considerably, for example in /i/.
However, the relatively large dissipation of the vocal tract in the
frequency range of low first formants (such as for /i/ and /u/), caused
primarily by vibration of the vocal-tract walls, acts to reduce the
interaction; even so, the glottal flow waveforms still differ markedly from
each other. In all these cases, the tract losses are set to give bandwidths
for the first formant equal to values measured on the human tract
for the closed-glottis condition.¹⁶
Fig. 10a. Results of the DDP-516 simulation for the vowel /i/ showing area waves,
glottal flow, and mouth-output sound pressure.
Fig. 10b. Same as Fig. 10a for the vowel /u/.
Fig. 10c. Same as Fig. 10a for the vowel /a/.



The data of Fig. 10* permit a comparison between the glottal waveform
and the speech pressure wave. The comparison is familiar from the
results of inverse filtering.¹³,¹⁷ There is a delay of about
0.5 ms between the waves, corresponding to the time required for the

* Sound spectrograms of the computed mouth-output sound pressure are shown for
several vowels in Fig. 10d.
Fig. 10d. Sound spectrograms of the computed mouth-output sound pressure for
the vowels /i, e, a, o/.



sound to travel from the glottis to the lips. The waveforms for /a/, /i/, 
and /u/ show that the formants are excited largely at the closure of 
the cords. The output pressure waves attenuate rapidly with increasing 
glottal area during the opening phase of the glottal cycle. 
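The 0.5-ms delay is consistent with plane-wave travel time through the tract. A back-of-the-envelope check, assuming a speed of sound of 35,000 cm/s and a glottis-to-lips length of 17.5 cm (both assumed values, not stated in this section):

```python
SPEED_OF_SOUND_CM_S = 35_000.0  # assumed speed of sound in warm, moist air
TRACT_LENGTH_CM = 17.5          # assumed glottis-to-lips length (hypothetical)

def propagation_delay_ms(length_cm, c=SPEED_OF_SOUND_CM_S):
    """One-way plane-wave travel time (ms) over length_cm."""
    return 1e3 * length_cm / c

print(f"delay = {propagation_delay_ms(TRACT_LENGTH_CM):.2f} ms")  # delay = 0.50 ms
```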

10.5 Effect of Subglottal Pressure 

The influence of the subglottal pressure on the fundamental frequency
of the vocal-cord vibration is another important aspect of voice
production. Typical behavior of the model for these factors is shown in
Fig. 11. The nonlinear coefficient of the spring, η_k, is shown as the
parameter for the vowel /ə/. The data for the vowels /i/ and /a/
correspond to η_k = 100 only. For all these cases, the coefficient
describing the nonlinearity in the deformation of the vocal cords at
closure is taken as η_h = 5η_k.

The slope of the fundamental frequency as a function of subglottal
pressure is seen to be about 2.5 Hz/cm H₂O for η_k = 100, independent
of the vowel configuration. This represents good agreement with
measurements which have been made on human speech in the chest register
by Hixon et al.¹⁸ The curve for η_k = 0, that is, linear springs, shows
a saturation characteristic for subglottal pressures greater than 8 cm H₂O.
These results suggest that pitch variations with subglottal pressure
might be ascribed to two causes. One is the collision of the vocal cords
at closure when the amplitude of vibration is not too large and the
subglottal pressure is less than several cm H₂O. The other is the
nonlinearity of the deflection of the muscles and ligaments at large
amplitudes of vibration and at subglottal pressures greater than several cm H₂O.
In the latter case, the nonlinearity becomes dominant when large
Fig. 11. Variation of fundamental frequency with subglottal pressure. The
parameter is the nonlinear coefficient of the stiffnesses, η_k.



displacement amplitude increases the effective stiffness of the springs. 
This tends to increase fundamental frequency. The minimum subglottal 
pressure for vowel production is about 2 cm H₂O.
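The amplitude-stiffening mechanism described above can be illustrated with a first-order harmonic-balance estimate, a standard technique for the cubic (Duffing) oscillator rather than the method used in the simulation. The sketch assumes a restoring force k(x + η_k x³) with η_k in cm⁻², ignores collision, damping, and aerodynamic forcing, and uses hypothetical amplitudes.

```python
import math

def hb_frequency_ratio(eta_k, amplitude_cm):
    """First-order harmonic-balance estimate of f/f0 for a spring k*(x + eta_k*x**3).

    Substituting x = A*cos(w*t) and keeping only the fundamental gives
    w**2 = w0**2 * (1 + 0.75 * eta_k * A**2).
    """
    return math.sqrt(1.0 + 0.75 * eta_k * amplitude_cm ** 2)

for eta in (0.0, 100.0):  # linear springs vs. the eta_k = 100 case
    for amp in (0.02, 0.05):  # hypothetical displacement amplitudes, cm
        print(f"eta_k = {eta:5.1f} cm^-2, A = {amp:.2f} cm -> f/f0 = {hb_frequency_ratio(eta, amp):.3f}")
```

With linear springs the ratio stays at 1 for every amplitude, so in this idealization only the nonlinear case lets a larger amplitude (and hence higher subglottal pressure) raise the frequency.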

In the earlier work with the one-mass model, the fundamental frequency
was found to depend significantly on tract configuration.
This influence was due in large part to the pressure recovery
assumed at the glottal outlet, namely ½P_B, according to van den Berg's
data. When the intraglottal pressure distribution derived here is used 
in the one-mass model, the interaction across vowels and with subglottal 
pressure is much less. 

The two-mass model becomes a one-mass model if k_c is increased to
a large value. For this condition, the variation in fundamental frequency
with subglottal pressure is shown for several vowels in Fig. 12. The
behavior is similar to that of the two-mass model. Under these conditions, the
duty ratio of the one-mass model tends to be slightly greater than that of the two-mass model.

Duty ratio is another aspect of the model that can be compared to 
human phonation. An increase in subglottal pressure produces an 
increase in glottal flow and in glottal amplitude. Duty ratio (open time 
to total period) decreases for this increase in subglottal pressure and is 
asymptotic to a value around 0.5, as shown in Fig. 13. This value 
compares well with measurements on natural speech. 
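Duty ratio, as defined here (open time to total period), can be computed from one period of a sampled area wave by counting samples with positive area. The waveform below is a hypothetical raised sinusoid, chosen because its exact open fraction is known in closed form; the function name is ours.

```python
import math

def duty_ratio(area_samples):
    """Open time / total period: fraction of one period's samples with positive glottal area."""
    open_count = sum(1 for a in area_samples if a > 0.0)
    return open_count / len(area_samples)

# Hypothetical one-period area wave (cm^2): sinusoid of amplitude 0.05 raised by 0.015.
n = 1000
amplitude, offset = 0.05, 0.015
wave = [amplitude * math.sin(2 * math.pi * k / n) + offset for k in range(n)]

# For this waveform the exact open fraction is 1/2 + asin(offset/amplitude)/pi.
exact = 0.5 + math.asin(offset / amplitude) / math.pi
print(f"sampled {duty_ratio(wave):.3f}, exact {exact:.3f}")  # sampled 0.597, exact 0.597
```

A zero offset gives a duty ratio of 0.5, the value toward which the model's duty ratio tends at high subglottal pressure.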
Fig. 12. Variations of fundamental frequency with subglottal pressure for the
one-mass and two-mass models. The parameter is vowel configuration.

10.6 Effect of Cord Tension 

As in the previous studies,¹⁻³ it is convenient to apply a "tension
parameter," Q, to control fundamental frequency. This can be achieved
by dividing the masses and thicknesses by Q and multiplying the spring
constants by Q, so that the fundamental frequency varies
proportionally with Q. Phase difference, duty ratio, and glottal area
waveforms are essentially uninfluenced by Q, and the amplitudes of 
glottal area and glottal flow decrease gradually with increasing Q. 
The glottal flow waveform also varies in detail depending on the funda- 
mental frequency, because the formants contributing to the temporal 
detail of the glottal flow are unchanged while the period of the glottal 
Fig. 13. Variation of duty ratio with subglottal pressure.
flow varies as a function of Q. Changes in flow waveform with pitch
variation are greatest in cases where acoustic interaction is especially 
pronounced (such as for /a/). 
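That this scaling makes the fundamental frequency proportional to Q follows from the mechanical natural frequency √(k/m)/(2π): with k → Qk and m → m/Q it becomes Q times larger. A minimal check for the lower mass with its typical values; the function names are illustrative, and only the mechanical oscillator is scaled, not the vocal tract.

```python
import math

def apply_tension(m, d, k, q):
    """Tension-parameter scaling: mass and thickness divided by Q, stiffness multiplied by Q."""
    return m / q, d / q, k * q

def natural_frequency(m, k):
    """Undamped natural frequency (Hz) of a mass-spring pair."""
    return math.sqrt(k / m) / (2.0 * math.pi)

m1, d1, k1 = 0.125, 0.25, 80e3  # typical lower-mass values (g, cm, dyn/cm)
f_ref = natural_frequency(m1, k1)
for q in (0.8, 1.0, 1.5):
    m, d, k = apply_tension(m1, d1, k1, q)
    print(f"Q = {q}: f/f_ref = {natural_frequency(m, k) / f_ref:.3f}, d1 = {d:.3f} cm")
```

Because the tract is left unscaled, its formants stay put while the period changes, which is why the detail of the glottal flow wave varies with Q.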

In human speech the duty ratio has a tendency to increase with the
fundamental frequency.¹⁹ This feature can also be given to the vocal-
cord model by making the coupling stiffness k_c increase
more than in linear proportion to Q. A variation as Q² appears more
realistic. Physiologically this corresponds to the considerable decrease
in compliance and thickness of the vocal cords when they are stretched
by contraction of the cricothyroid muscle and other muscles associated
with contraction of the vocalis. The increase of k_c more than proportional
to Q is equivalent to shifting the glottal operation condition on a line 
parallel to the abscissa in Fig. 6. As indicated in Fig. 8, a shift to the 
right reduces the phase difference and increases the duty ratio without 
changing other features of the cord vibration, except near the boundaries 
of the oscillation range. 
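The horizontal shift can be checked by computing coordinates on the normalized k_2-k_c plane. This sketch assumes the normalizing factor is m_2k_1/m_1 (our reading of the Fig. 8 caption) and that k_1 and k_2 scale with Q while k_c scales as Q^p; the names and the exponent parameter are hypothetical.

```python
def normalized_point(m1, m2, k1, k2, kc):
    """Coordinates on the normalized k2-kc plane: each stiffness divided by m2*k1/m1."""
    scale = m2 * k1 / m1
    return k2 / scale, kc / scale

def scale_for_tension(m1, m2, k1, k2, kc, q, kc_exponent=2):
    """Masses / Q, k1 and k2 * Q, coupling stiffness kc * Q**kc_exponent."""
    return m1 / q, m2 / q, k1 * q, k2 * q, kc * q ** kc_exponent

typical = (0.125, 0.025, 80e3, 8e3, 25e3)  # m1, m2 (g); k1, k2, kc (dyn/cm)
for q in (0.8, 1.0, 1.5):
    k2n, kcn = normalized_point(*scale_for_tension(*typical, q))
    print(f"Q = {q}: normalized k2 = {k2n:.3f}, normalized kc = {kcn:.3f}")
```

With k_c ∝ Q² the normalized k_2 stays fixed while the normalized k_c grows in proportion to Q, i.e., the operating point moves parallel to the abscissa. With kc_exponent=1 the point does not move at all, in line with the observation that phase difference and duty ratio are essentially uninfluenced by plain Q scaling.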

Behavior of the cord model with the Q parameter so defined is shown 
in Figs. 14 and 15. Variations in waveforms with Q are shown for the 
vowel /a/ in Fig. 14. The relations between fundamental frequency, 
duty ratio, and amplitude of glottal area with Q are plotted in Fig. 15. 
Variation of the duty ratio with frequency falls into the range measured 
in inverse filtering experiments.¹⁹
Fig. 14a. Effect of tension parameter, Q, on cord-tract output for the vowel /a/.
Q = 0.8.
Fig. 14b. Same as Fig. 14a with Q = 1.0.

XI. INTERACTION EFFECTS WITH LARGE ACOUSTIC LOADS 

11.1 Differences Between Two-Mass and One-Mass Models

The measurements discussed previously show that the fundamental 
frequency and the area waveforms of the cord model are not strongly 
influenced by tract geometry. The interaction with glottal flow, however, 
Fig. 14c. Same as Fig. 14a with Q = 1.5.
Fig. 15. Effect of tension parameter (Q) on fundamental frequency (F), duty
ratio (DR), and glottal area (A_g).



is marked. We have further investigated the effect of acoustic load by 
lowering the frequency of the lowest resonance of the acoustic load 
(the first formant) into the range of the fundamental frequency. This 
increases the driving-point impedance at the fundamental frequency 
and strong coupling between source and load is expected. 

The formant frequencies are lowered by lengthening the simulated 
vocal tract. Measurements of the fundamental frequency are shown in 
Fig. 16 as a function of the length of a uniform vocal-tract tube, 5 cm²
in cross-section. Data are shown for both the two-mass cord model and
an equivalent one-mass model (k_c → ∞). The measurements are for
the typical glottal conditions. The shunt impedance of the vocal-tract
wall (wall vibration) is not taken into account per se; this effect
is only approximated by an increase in damping for the first 16-cm
section of the tube (as was used for the /a/ configuration). The remaining
tube is regarded as an ideal hard-wall tube. The first resonance frequency
of the vocal-tract tube, F_01, is shown by the solid line.

The frequency of the two-mass model decreases more gradually than
that of the one-mass model as the tube length increases. When the
oscillation frequency of the former meets the first formant frequency
of the vocal-tract tube, a sharp increase of the fundamental frequency
occurs for further increase in tube length. The frequency returns to
almost the same value as for a short tube. The frequency jump occurs
at the resonant frequency of the vocal-tract tube, independent of 
dissipation and of glottal conditions. For example, an increase in 
acoustic dissipation of the vocal tract and a decrease in mechanical
damping of m_1 and m_2 raise the onset frequency of the jump, but the
frequency where the jump occurs is still the first resonant frequency 
of the tube. The variation of frequency with vocal-tube length is shown 
for two conditions of damping in Fig. 16. 

The curve of F_01 as a function of tube length marks the boundary
between an inductive driving-point impedance (to the left) and a
capacitive driving-point impedance (to the right). The frequency jump
for the two-mass model, which occurs at F_01 regardless of the glottal
conditions, places its new oscillation in the capacitive region, that is, 
between the first pole and second zero of the driving-point impedance. 
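The inductive/capacitive boundary can be illustrated with the textbook driving-point impedance of a lossless, hard-walled uniform tube driven at one end and ideally open at the other, Z_in = jZ_0 tan(ωL/c), whose first pole lies at c/(4L). This idealization ignores the wall damping and radiation load of the simulated tube, and the speed of sound (35,000 cm/s) and the function names are assumptions.

```python
import math

C_CM_S = 35_000.0  # assumed speed of sound, cm/s

def first_pole_hz(length_cm, c=C_CM_S):
    """First pole of the driving-point impedance of a lossless tube open at the far end: c/(4L)."""
    return c / (4.0 * length_cm)

def reactance_type(freq_hz, length_cm, c=C_CM_S):
    """Sign of X in Z_in = j*Z0*tan(w*L/c): positive is inductive, negative is capacitive."""
    x = math.tan(2.0 * math.pi * freq_hz * length_cm / c)
    return "inductive" if x > 0 else "capacitive"

for length in (17.5, 50.0, 80.0):
    print(f"L = {length:5.1f} cm: F_01 = {first_pole_hz(length):6.1f} Hz, "
          f"a 120-Hz source sees a {reactance_type(120.0, length)} load")
```

Between the first pole c/(4L) and the next zero c/(2L) the reactance is negative (capacitive), which is where the two-mass model's post-jump oscillation lies; above c/(2L) it turns inductive again, the region into which the one-mass model jumps.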

A frequency jump also occurs in the one-mass model. In this case, 
however, the jump is to the original frequency for which the driving- 
point impedance is still an inductive impedance, that is, between the 
Fig. 16. Variation of fundamental frequency with acoustic load for the two-mass
and one-mass models. F_01 shows the frequency of the first pole of the driving-point
impedance; the first zero is also shown.
second zero and second pole of the driving-point impedance. This
behavior can be predicted by an analysis of the oscillator with a uniform 
transmission line as a load. 

11.2 Effects of Acoustic Load on Human Voicing 

For comparison with the model behavior, we have measured similar 
loading effects in human voicing. To bring a first formant resonance 
into the range of the voice pitch, subjects phonated into a long metal 
tube the length of which was periodically changed from 39 cm to 73 cm 
by a motor (i.e., a bazooka-like sliding pipe). The subjects were in- 
structed to pronounce the sustained vowel /a/ at medium sound level 
and with constant glottal adjustment regardless of the change in tube 
length. Fundamental frequency (pitch) measurements were made at 
several frequencies in the chest register. Typical results for one subject 
are shown in Fig. 17. 

The voice pitch was measured at 10-ms intervals by a pitch-extracting
program.²⁰ The length of the metal tube (exclusive of the subject's
vocal tract) is also indicated on the abscissa along with the corre- 
sponding time scale for the length change. Adjacent open and closed 
points (circles or triangles) pertain to different cycles of the pipe in one 
set of measurements. One sees frequency jumps similar to those in the 
two-mass model. However, the observed onset frequencies of the jumps 
are generally higher than the resonant frequency of the compound 
tube consisting of the metal tube and the subject's vocal tract (neglecting 
the shunt impedance of the vocal-tract wall). The deviation from the
resonant frequency becomes especially noticeable for lower frequencies. 

Toward an interpretation, it is known that the shunting impedance 
caused by vibration of the walls of the vocal tract produces a "cutoff 
frequency" of the sound transmission and constrains the lowest first 
formant frequency of the vocal tract.²¹ This effect will contribute to
raising the resonant frequency of the compound tube in a frequency
range near the cutoff frequency. In the present instance, one could
conceive of the walls of the cheeks, pharynx, and soft velum as yielding
to vibration because of the vocal-tract geometry for /a/ and because
of the long wavelength. At the cutoff frequency of the vocal tract, the 
first resonance frequency of the combined vocal tract and metal pipe 
is essentially that of the metal pipe alone. The latter is shown in Fig. 17 
by the broken line. 

From Fig. 17, we can presume the cutoff frequency of the vocal tract
for /a/ to be a little lower than 200 Hz. The effect of the wall vibration
could thus account for the rightward shift of the observed pitch jumps. 
The rightward shift is most noticeable at the lower frequencies as this 
Fig. 17. Fundamental frequency measurements made for a human subject when
the acoustic load on his vocal cords is varied. The acoustic load is varied by
periodically changing the length of a uniform tube fitted to the subject's mouth. The
broken line shows the first resonant (pole) frequency of the uniform tube.

argument would predict. Even with these uncertainties, we see the 
close similarity in the dependence of fundamental frequency on acoustic 
load between the human larynx and the two-mass model.* It is further 
of interest that the vocal cords can self-oscillate without regenerative 

* Note added in proof: After this paper was written, we measured the "cutoff 
frequency" for the vocal tract and tube combination. We found its value to be 
195 Hz. 
feedback from the subglottal and supraglottal system. In addition, the
vibration of the soft walls of the vocal tract acts as a buffer to aid stable 
operation in the presence of coupling between the vocal cords and the 
vocal tract as the latter takes on a wide variety of shapes. 

XII. CONCLUSION 

The two-mass formulation of the vocal-cord model is seen to yield 
physiologically realistic behavior. In particular, the phase difference
between upper and lower cord edges corresponds well with motion
observed in high-speed photography. The two-mass formulation also 
leads to a natural correlate to chest and falsetto register with coupling 
stiffness (lax in chest and tense in falsetto) being an important factor 
along with mass and thickness of the cords. 

The computer measurements show that the two-mass model is 
capable of oscillation just above the resonant frequencies of the acoustic 
load (i.e., the formant frequencies of the vocal tract), duplicating a 
capability of the human cords. The one-mass model cannot oscillate 
in this frequency range, where the driving-point reactance is capacitive. 
Further, the intra-glottal pressure distribution derived for use with the 
two-mass model yields cord-tract interaction similar to human speech. 
Fundamental frequency varies with subglottal pressure approximately 
as 2 to 3 Hz/cm H 2 0, and changes in vowel configuration do not markedly 
influence the fundamental frequency. Closures tighter than those which 
occur in vowel shapes (for example, at consonant-vowel boundaries)
can of course influence the fundamental frequency. The improved 
intra-glottal pressure distribution is also applicable to a one-mass 
formulation, and it produces physiologically realistic cord-tract inter- 
actions with a one-mass model. 

The programmed cord oscillator and the digitally simulated vocal 
tract constitute a complete synthesizer for voiced sounds. The system 
so implemented has potential for speech synthesis applications such as 
computer voice response. Especially for techniques such as text
synthesis,²² the cord model and vocal tract offer means for natural control
of tract and larynx parameters, i.e., subglottal pressure, cord tension,
neutral area, and tract shape. These parameters appear sufficient for
describing both voiced and voiceless sounds in continuous speech.³ In
some synthesis applications, the complexity of the two-mass model may 
not be needed and a simpler one-mass formulation may serve. In normal 
voice production, phonation occurs at a fundamental frequency always 
below the first vocal resonance (formant). Here, the driving-point impedance
is inductive and the one-mass oscillator performs acceptably,
particularly with the improved intra-glottal pressure distribution. 

The two-mass model, because of its physiological detail, also provides 
a potential tool for medical analyses of voice disorder. Although the 
present simulation assumes bilateral symmetry of the opposing cords, 
asymmetric configurations can be implemented. The effects of defi- 
ciencies such as unilateral cord paralysis can therefore be investigated 
and quantified. Biomedical engineering is making increased use of 
digital simulations of physiological behavior. The simulation technique 
described here permits acoustic analysis not only of voice functions
but also of human respiration.